Skip to content

ci: fix Check job status to detect skipped results (fixes #2208)#2210

Merged
rwgk merged 5 commits into
NVIDIA:mainfrom
leofang:leofang/fix-issue-2208-gating-check
Jun 16, 2026
Merged

ci: fix Check job status to detect skipped results (fixes #2208)#2210
rwgk merged 5 commits into
NVIDIA:mainfrom
leofang:leofang/fix-issue-2208-gating-check

Conversation

@leofang

@leofang leofang commented Jun 13, 2026

Copy link
Copy Markdown
Member

Fixes #2208.

Replaces the cancelled || failure predicate in .github/workflows/ci.yml's checks: job with CCCL's check_result pattern: per-dep expected="success", with expected="skipped" for legitimate [doc-only] skips, and an early short-circuit for [no-ci].

Reference: https://github.com/NVIDIA/cccl/blob/8c0e6cb1b6412d7bd0070f9ac3f55fa80231961a/.github/workflows/ci-workflow-pull-request.yml#L463-L526

End-to-end validation (pre-fix vs post-fix reproducer): #2210 (comment)

…esult

Fixes NVIDIA#2208. The previous `Check job status` body only treated a dep's
`result == 'cancelled' || == 'failure'` as failure, letting `'skipped'`
slip through silently. When a `build-*` job fails, the dependent
`test-*` job is set to `'skipped'` by default needs-failure
propagation, and the aggregator passes -- exactly the case demonstrated
by NVIDIA#2209.

Adopt CCCL's `check_result` pattern: explicit `expected="success"` per
dependency, with `expected="skipped"` for legitimate `[doc-only]` skips,
and an early short-circuit for `[no-ci]`. Now any deviation from the
expected status (including `'skipped'` from a failed upstream) fails
the aggregator.

Reference: NVIDIA/cccl ci-workflow-pull-request.yml L463-L526.
@copy-pr-bot

copy-pr-bot Bot commented Jun 13, 2026

Copy link
Copy Markdown
Contributor

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions github-actions Bot added the CI/CD CI/CD infrastructure label Jun 13, 2026
@leofang

leofang commented Jun 14, 2026

Copy link
Copy Markdown
Member Author

/ok to test 170c799

@github-actions

This comment has been minimized.

@leofang

leofang commented Jun 15, 2026

Copy link
Copy Markdown
Member Author

/ok to test 739c499

@leofang leofang added bug Something isn't working P0 High priority - Must do! labels Jun 15, 2026
@leofang leofang added this to the cuda.core v1.1.0 milestone Jun 15, 2026
leofang added 2 commits June 16, 2026 01:36
Mirrors the NVIDIA#2209 reproducer on this branch's new aggregator. Expected
outcome:
- All Build * matrix entries fail at the env-vars build step.
- Downstream Test * jobs are skipped (cascaded needs-failure).
- Check job status now reports failure -- the symmetric, post-fix
  counterpart to NVIDIA#2209 (where Check job status reported success despite
  the same set of failures).

DO NOT MERGE while this commit is present. Remove before merge.

Refs NVIDIA#2208, NVIDIA#2209.
@leofang

leofang commented Jun 16, 2026

Copy link
Copy Markdown
Member Author

/ok to test 811388c

@leofang

leofang commented Jun 16, 2026

Copy link
Copy Markdown
Member Author

Gating-check fix validated end-to-end

The new aggregator catches the exact failure mode that motivated #2208. Demonstrated by running the #2209 reproducer (intentional exit 7 in ci/tools/env-vars) against this branch:

Same reproducer, opposite verdict. The skipped-as-success hole is closed.

The demo commit 811388c99e will be reverted next so this PR returns to a mergeable state.

@leofang

leofang commented Jun 16, 2026

Copy link
Copy Markdown
Member Author

/ok to test 7d31a04

@leofang leofang self-assigned this Jun 16, 2026
@leofang leofang marked this pull request as ready for review June 16, 2026 02:14

@rwgk rwgk left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me and codex gpt-5.5:

The new check_result approach is a clear improvement over the previous cancelled || failure checks: it closes the skipped-as-success hole while keeping the intentional [doc-only] and [no-ci] cases explicit.

The one subtle behavior change I noticed is that fork or non-NVIDIA repo runs may now report a red final aggregator where jobs are skipped by owner-gated conditions. I agree that is not a concern for this PR, and it is preferable to avoid weakening the aggregator in a way that could reintroduce false-green CI.

@rwgk rwgk merged commit 671af69 into NVIDIA:main Jun 16, 2026
203 of 206 checks passed
@github-actions

Copy link
Copy Markdown
Doc Preview CI
Preview removed because the pull request was closed or merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working CI/CD CI/CD infrastructure P0 High priority - Must do!

Projects

None yet

Development

Successfully merging this pull request may close these issues.

CI: "Check job status" aggregator passes when build jobs fail (skipped-as-success regression)

2 participants